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Abstract 

Granger causality, a popular method for determining causal influence between stochastic processes, 
is most commonly estimated via linear autoregressive modeling. However, this approach has a serious 
drawback: if the process being modeled has a moving average component, then the autoregressive model 
order is theoretically infinite, and in finite sample large empirical model orders may be necessary, result¬ 
ing in weak Granger-causal inference. This is particularly relevant when the process has been filtered, 
downsampled, or observed with (additive) noise - all of which induce a moving average component and 
are commonplace in application domains as diverse as econometrics and the neurosciences. By contrast, 
the class of autoregressive moving average models—or, equivalently, linear state space models—is closed 
under digital filtering, downsampling (and other forms of aggregation) as well as additive observational 
noise. Here, we show how Granger causality, conditional and unconditional, in both time and frequency 
domains, may be calculated simply and directly from state space model parameters, via solution of a 
discrete algebraic Riccati equation. Numerical simulations demonstrate that Granger causality estima¬ 
tors thus derived have greater statistical power and smaller bias than pure autoregressive estimators. We 
conclude that the state space approach should be the default for (linear) Granger causality estimation. 


1 Introduction 

Wiener-Granger causality (GC), a powerful method for determining information flow or directed functional 
connectivity between stochastic variables, is based on the premise that cause (a) precedes effect, and (b) 
contains unique information about effect (Wiener, 1956; Granger, 1963, 1969). Since its inception, it has 
steadily gained popularity in a broadening range of fields (see e.g. Seth et ah 2015), due to its data-driven 
nature (few structural assumptions need be made about the data generation process), conceptual simplicity, 
spectral decomposition property and ease of implementation. It is most commonly operationalized in a 
multivariate linear autoregressive (AR) modeling context, to the extent that it is frequently viewed as an 
essentially linear autoregressive method. This is unfortunate, not just because it obscures the nonparametric 
essence of the original idea (indeed GC may be formalized in purely information-theoretic terms (Hlavackova- 
Schindler et ah, 2007; Amblard and Michel, 2011; Barnett and Bossomaier, 2013)), but because the pure 
AR modeling approach frequently sits uncomfortably with the structure of the time series data to which it 
is applied. 

Time series data from diverse application domains, in particular econometrics and the neurosciences, 
often contain a strong moving average (MA) component, which may not be represented parsimoniously by a 
finite order AR model. Apart from any MA component intrinsic to the underlying signal, an MA component 
may be induced in an observed process by common data acquisition, sampling and pre-processing procedures. 
Thus, for example, if a (finite order) AR process is subsampled or aggregated, or observed with additive 
measurement noise, the resultant process will be an autoregressive moving average (ARMA) process (Solo, 
2007). A subprocess of an AR process will also generally be ARMA (Nsiri and Roy, 1993), as will a filtered 
AR process (Barnett and Seth, 2011). In general an ARMA process will have infinite AR model order - but 
of course in finite sample a finite, possibly large, model order must be selected, which will be reflected in 
GC statistics of reduced statistical power and increased bias. 
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In contrast to linear AR models, the class of multivariate ARM A models—or, equivalently (Hannan 
and Deistler, 2012), finite order linear state space (SS) models—is closed under all of the above-mentioned 
operations. While the potential of SS modeling for GC inference has been remarked on (Solo, 2007; Valdes- 
Sosa et ah, 2011; Seth et ah, 2013), a rigorous derivation and demonstration of this potential has so far been 
lacking. Here we show how GC, conditional and unconditional, in both time and frequency domains, may be 
easily derived from SS parameters via solution of a single discrete algebraic Riccati equation (DARE). We 
verify in simulation, for a minimal ARMA process, the increase in statistical power and reduction in bias of 
GC estimated via SS, as compared to AR, modeling. We also discuss potential extensions of the SS approach 
beyond the usual stationary linear scenario, to encompass cointegrated processes, models with time-varying 
parameters, and nonlinear models. State space Granger causal inference stands to significantly enhance 
our ability to identify and understand information flow and causal interactions in a wide range of complex 
dynamical systems. Given the ready availability of efficient and effective state space system identification 
procedures, state space modeling should become the default approach to Granger causal analysis. 


2 State space models 


Our starting point^ is a a discrete-time, real-valued vector stochastic process = [yit y 2 t ■ ■ ■ yntV, —oo < 
t < oo, of observations. The general time-invariant linear SS model without input for the observation process 

Vt is 

= Axf + Ut state transition equation (la) 

= Cxt + Vt observation equation (lb) 


where Xt is an (unobserved) m-dimensional state variable, Ut,Vt are zero-mean white noise processes, C is the 
observation matrix and A the state transition matrix. The parameters of the model (1) are (A, C, Q, R, S), 
where 


Q 

5T 


S 

R 


I Ut [uj vj] I 


( 2 ) 


is the noise covariance matrix. We assume that xt, yt are weakly stationary, which requires that the transition 
equation (la) be stable: the stability condition is Amax(A) < 1, where Amax(A) denotes the maximum of the 
absolute values of the eigenvalues of A. We also assume that R is positive-definite. A process yt satisfying 
the stable SS model (1) also satisfies a stable ARMA model; conversely, any stable ARMA process may be 
shown to satisfy a stable SS model of the form (1) (Hannan and Deistler, 2012). 

Given an SS model in general form (1), we define Zt = E| Xt\ y^_i}, the projection of the state variable 
Xt on the space spanned by the infinite past y^_i = \y\-i yt -2 ■ • observation variable (Hannan 

and Deistler, 2012). It is then not difficult to show (Aoki, 1994) that the innovations et = yt — VT-i} 
constitute a white noise process with positive-definite covariance matrix S = and that in terms of 

the new state variable Zt we have an SS model 


zt+i = Azt + Kst state transition equation (3a) 

yt = CZt + St observation equation (3b) 

for yt, where K is the Kalman gain matrix. The SS model (3) is said to be in innovations form, with 
parameters {A, C, K, S). Note that innovations form constitutes a special case of (1) with Ut = Kst, Vt = St, 
so that Q = K'LK^ 5 = and R = S. 

We can write the state equation (3a) as Zt = {I — Az)~^Kz ■ St where here z represents the back-shift 
operator^, which yields the MA representation for the observation process 

yt = H{z)-St, H{z) = I + C{I-Az)-^Kz (4) 

'^Notational conventions: vector quantities are written in lower-case bold, matrices in upper case. Superscript ‘t’ denotes the 
transpose, conjugate transpose and | • | the determinant of a (complex) matrix; superscript ‘r’ refers to a “reduced” model. 
E{ •} denotes expectation and E{ j •} conditional expectation. 

^In the frequency domain 2 = where —tt < oj < tt is the phase angle (we note that the inverse is sometimes used 

for the back-shift operator, particularly in the signal processing literature). 
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with transfer function H{z). From (3) we have Zt+i = Bzt + Ky^ with B = A — KC, from which we may 
derive the AR representation 


B{z)-yt = et, B{z) = I-C{I-Bz)-^Kz (5) 

where B{z) = H{z)~^ is the inverse transfer function. The model is minimum phase if Xma.x{B) < 1 (a 
stable inverse ARMA model for the process y^ then exists); we assume minimum phase from now on. From 
(4) the cross-power spectral density (CPSD) of the observation variable has the factorization (Wilson, 1972) 


S{z) = H{z)T.H*{z) (6) 

on |z| = 1 in the complex plane. 

Assuming stability, minimum phase and positive-definite observation noise covariance^, the DARE (Lan¬ 
caster and Rodman, 1995) 

P - APA^ = Q- {APC^ +S)iCPC^ +R)-\CPA^ +S^) (7) 

has a unique stabilizing solution for P, and we have (Kailath, 1980) 

S = CPC^ +R (8a) 

K = {APC^ +5)S-^ (8b) 


Given general SS parameters {A, C, Q, R, S) then, corresponding innovations form parameters K, T, may be 
obtained through (8) via solution of (7). 


3 Granger causality 


Granger causality is commonly expressed in terms of prediction error. The full multivariate, conditional and 
spectral theory of GG in an AR framework was consolidated by Geweke (1982, 1984), whose formulation we 
follow here. 

Suppose that an observable process y^ is partitioned into subprocesses: y^ = [y\^ VstV- Within the 
“observable universe of information” represented by the process y^, GG from the y 2 t to y^ (conditional 
on y^^) quantifies the extent to which the past of y 2 t improves prediction of the future of y^ over and 
above the extent to which y^ (along with y^^) already predicts its own future. Now the best prediction 
(in the least-squares sense) of y^, given the entire universe of past information yi_i, is the projection 
VT-i}■ This prediction may be contrasted with the prediction of y^ based on the 

reduced universe of past information where = [yj^ VstY omits y 2 t from the observable information 
set. In Geweke’s formulation, predictive power is quantified by the generalized variances (Wilks, 1932; 
Barrett et ah, 2010) of the associated full and reduced residual errors (innovations), st = yt — E{yj| y^_i} 
and = y^ — E|yj| y^ri} respectively: specifically, Geweke (1984) defines the time-domain GG from y 2 
to yi conditional on y^ (for unconditional GG we may take y^^ to be empty) as 




y2^yi\y3 


= In 


"111 

"111 


(9) 


where S = K{e^eJ} and In a maximum likelihood (ML) framework, the corresponding 

statistic is just the log-likelihood ratio (Neyman and Pearson, 1928, 1933) for the nested AR models asso¬ 
ciated with the projections E{ y^l yJLi} and E{ y^l y^Zi}- It also has a natural interpretation as the rate 
of “information transfer” from process y 2 t to process y^ (Palus et ah, 2001; Barnett et ah, 2009; Barnett 
and Bossomaier, 2013). 

^In fact these conditions may be relaxed somewhat - see e.g. (Chan et ah, 1984; Solo, 2015). 
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Geweke (1982, 1984) goes on to define GC iii the spectral domain, and establishes the 

fundamental frequency decomposition 

^V2^yi\v3^^ j fy2^yi\y3^^ 


In the unconditional case (i.e. y^^ is empty) fy^^y^i^) is given by (Geweke, 1982) 


( 11 ) 


where denotes a partial covariance matrix. The conditional case is somewhat more 

intricate: dehning the process ]^) the frequency-domain causality from ^2 to yi conditional 

on y^ is dehned (Geweke, 1984) as the unconditional causality 


~yTjT^-^(2) (12) 

Let S{z), H{z) and E be respectively the GPSD, transfer function and innovations covariance matrix of the 
process y^. We note hrstly that, since the innovations process is white, 511 ( 2 :) is just the flat spectrum 
E'J^. The AR representation for the reduced process y^ is B^{z) • where B^{z) = H^{z)~^ is the 

inverse transfer function of the reduced model, so we have y^ = B(z) ■ y^, where 


B{z) 


0 


0 BUz) 
I 0 
0 B-,,{z\ 


(13) 


But y^ = H{z) ■ Bt, so that y^ = B{z)H{z) ■ £t and it follows that H{z) = B(z)H{z) and S = E. From 
(11,12), after some matrix algebra, we derive the following explicit expression for conditional GG in the 
frequency domain: 


f y2^yi\y3^^^ 


J:l,-[H^2iz) Hisiz)] 

^2211 ^23|1 

r^r2(^)i 


.^3211 ^33|1_ 

[^r3(^)J 


(14) 


where 


[Hu{z) Hisiz)] = [Bi^(z) 


Bisiz)] 


Hi2iz) 

Hz2{z) 


Hi^{z) 

HM, 


(15) 


Suppose now that we have an SS model in innovations form (3) for the observation process y^, satisfying 
the assumptions of stability, minimum phase and positive-dehnite innovations covariance. The reduced 
process y^ satishes the observation equation: 


yl = C-z, + vl (16) 

where = [C^ and = [e{j £ 31 ]^. Together with the original state transition equation (3a), (16) 
constitutes an SS model for y^. Note that in general this “reduced SS model” will not be in innovations 
form (it is for this reason that we write rather than e^). It is, however, trivially stable (since A is stable) 
and the noise covariance is positive-dehnite (since E is positive-dehnite). By the minimum 

phase assumption, the full process y^ is invertible ARMA, so that the subprocess y^ is also invertible ARMA 
(Nsiri and Roy, 1993) and the reduced SS model is thus minimum phase. Now the reduced Kalman gain 
[which enters into the expression for B'^{z)], and the reduced innovations covariance E*^, may be obtained 
by solving the DARE (7) for the general form SS (3a,16). Thus, given innovations form SS parameters 
(A, C, K, E), GCs in both time and frequency domain, conditional and unconditional, are readily calculated. 
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^y 2 ^yi\y 3 vanishes precisely when coefficients with block indices 12 vanish at all lags in the AR repre¬ 
sentation of the process y^. For an SS model, setting Bi 2 {z) = 0 in (5) yields the necessary and sufficient 
condition 

CiB'^K2 = 0, A: = 0,1,2,... (17) 

By the Cayley-Hamilton Theorem, (17) need be satisfied just for A: = 0,1,..., m — 1. In particular, {B, Ci) 
observable K 2 = 0 while {B,K 2 ) controllable Ci = 0 (the latter case is trivial, since then is 

a pure white noise process uncorrelated with the pasts of y 2 t and y^t). In terms of the MA representation 
(4), performing block-inversion of B{z) = the non-causality condition is found to be Hi 2 {z) — 

Hi 3 {z)H 3 ^{z)~^H^ 2 {z) = 0 in the conditional case, or just Hi 2 iz) = 0 in the unconditional case (in which 
case (17) holds with B replaced by A). 

AR model parameters are generally estimated from data by standard regression techniques such as 
Ordinary Least Squares (OLS) or so-called LWR (Levinson-Wiggins-Robinson) algorithms (Levinson, 1947; 
Whittle, 1963; Wiggins and Robinson, 1965; Morf et ah, 1978), which yield pseudo-ML estimates (Liitkepohl, 
2005). In this study we are not primarily concerned with identification of SS models, for which a large 
literature exists (Kailath, 1980; Ljung, 1999). We note nonetheless that many identification procedures 
yield (asymptotically) pseudo-ML estimates for model parameters. State space-sub space (SS-SS) algorithms, 
in particular, do not require iteration and are highly efficient (van Overschee and de Moor, 1996). See 
Appendix A for further discussion on GC estimation and statistical inference for AR and SS models; the 
key point is that SS-SS GC estimation is in general more computationally efficient and numerically stable 
than AR GC estimation. 


4 Simulation experiment 


To examine the performance of state space Granger-causal inference, we generated and analyzed simulated 
time series data using a minimal AR process for which GC values can be computed analytically. The AR 
data was then hltered to induce a MA component. We compare the statistical power and bias of SS-derived 
and AR-derived Granger-causal estimators (see Appendix B) for the resulting ARMA process. 

To generate simulated time series, we used the bivariate AR(1) process dehned by (Barnett and Seth, 

2011 ) 


Vit = + cr? 2 ,i-i + £u (18a) 

r? 2 t = br] 2 ,t-i+£ 2 t (18b) 


with residuals eit,e 2 t hd ~ AA(0,1) so that B = I, and |a|, \b\ < 1 so that the model is stable. The transfer 
function is 

_ [(1 - cz{l-az)-^{l-bz)-^] , . 

0 (l- 6 z)-i J 

From ( 6 ) the CPSD for yu is |1 — a^|“^|l — bz\~'^ [l -|- 6 ^ -|- — 2by{t{z)] which factorizes as H^{z)T.’^H’^*(z) 

with H'^{z) of the form (1 — az)~^{l — bz)~^{l — hz), so that yu is ARMA(2,1). We may calculate 

= I (a -|- \/A2 — 452 ) j where A = 1 -|- (20) 

We have J-r] 2 ^rji — InB*^, while causality in the yi —>■ y 2 direction clearly vanishes. 

We filter the process (18) through a binomially weighted moving average filter with transfer function 


G{z) 


(1 + fizY 0 
0 (1 + f2zY 


( 21 ) 


The filter (21) is always causal and stable, and we take max(|/i|, I/ 2 I) < 1 so that it is also minimum phase. 
The filtered series y^ = G{z) ■ 77 ^ is then ARMA(r, 1) and causalities 7/2 —^ Ui and yi —)■ 7/2 are invariant 
(Barnett and Seth, 2011)^. 

■^While Barnett and Seth (2011) state that causality and stability are sufficient for GC invariance, minimum phase is also 
required. We thank Victor Solo (private communication) for pointing this out. 
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Figure 1: AR and SS median model orders plotted against MA order, for the process with reference 
parameters (22,23). 


For our simulation experiment, we set reference parameters 

a = 0.9, 6 = 0.8, c = - l)(e^ - b^) (22) 

(so that ~ with F = 0.02 for a causal and F = 0 for a null (non-causal) model. Reference 

parameters for the filters (21) were 

fi = 0.6, /2 = 0.7 (23) 

The MA order r is allowed to vary. 

To test for power and bias, we obtained empirical distributions for AR and SS estimators Fy^^y-^, for 
both null and causal models. Empirical distributions were based on 10,000 sample simulations of (18) 
of T = 1000 time steps, filtered according to (21), for r = 0 (no MA component) to r = 10 (strong MA 
component). For the AR GC estimates, AR modeling of the filtered time series was by OLS. For each sample 
a model order was estimated using the BIG and Fy^^y-^^ computed from the corresponding autocovariance 
sequence (Barnett and Seth, 2014). For SS modeling, the Ganonical Correlations Analysis (CCA) SS-SS 
algorithm (Larimore, 1983; Bauer, 2005) was used; see Appendix C for details of the model order selection 
procedure. SS GC was then calculated according to (9) using a estimate obtained by solution of the 
corresponding DARE as described in the previous section. In both the AR and SS cases, model orders 
were virtually identical for the null and causal models. EIG. 1 confirms that AR model order increases 
(apparently linearly) more rapidly with increasing MA order than SS model order. 

Empirical null and causal GC distributions were fitted (based on ML parameter estimation) to F dis¬ 
tributions (see Appendix A). Bias, and statistical power at a = 0.05, were calculated from the empirical 
distributions and also using the fitted F CDFs; results for the F fit (FIG. 2) were very similar to the empirical 
results. The key outcome is that both bias and statistical power scale more slowly with increasing strength 
of the MA order for the SS than the AR causal estimators. We remark that the fluctuations in SS bias 
and power in FIG. 2 are not statistical fluctuations - nor is there reason to believe that they are artifacts of 
the experimental procedure (it is, however, not clear whether the SS order estimates are on average truly 
minimal - see e.g. the MA order 7 and 10 points in FIG. 2). 

5 Discussion 

In this paper we have described and experimentally validated a new approach to Granger-causal inference, 
based on state space modeling. This approach offers improved statistical power and reduced bias, when com¬ 
pared to standard autoregressive methods, especially when the data manifest a moving average component. 
Our primary theoretical contribution is to demonstrate that GC for a state space process, conditional and 
unconditional, in time and frequency domains, may be simply expressed in terms of SS model parameters, 
the calculation involving solution of a single DARE. The resulting estimation of GC from empirical time 
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MA order 


Figure 2: Statistical bias (left column), and power at a = 0.05 (right column) of Ty^-^y^ for the process 
with reference parameters (22,23), plotted against MA order, as calculated from F distributions fitted to 
the empirical distributions. 


series data is simpler and more computationally efficient and numerically stable than AR-derived causal 
estimation. Detailed simulation experiments verified the gain in statistical power and reduction in bias 
resulting from SS as compared to AR modeling of an ARM A process. 

These results are important because many application domains furnish time series data likely to feature 
a moving average component, either intrinsic to the data generating process, or arising from common data 
manipulation operations such as subsampling, aggregation, additive observational noise, filtering and sub¬ 
process extraction. One important example, where the use of GC has remained controversial, is in functional 
magnetic resonance imaging (fMRI) of the brain, where the time series are highly downsampled and filtered 
(by hemodynamic processes) reflections of the neural dynamics of interest (Valdes-Sosa et ah, 2011; Handw- 
erker et ah, 2012; Seth et ah, 2015). Other important scenarios where MA components are to be expected 
include climate science and economics. SS-based Granger causal inference is likely to outperform standard 
AR-based inference in these and other similar situations, since the class of ARMA (equivalently linear SS) 
models is closed under these operations. Given the ready availability of easily implementable and computa¬ 
tionally efficient SS subspace algorithms, the state space approach should be the default methodology when 
implementing GG analysis. 

The state space approach to Granger-causal inference also seems amenable to extensions beyond linear, 
stationary and homoscedastic models. For example, subspace algorithms may be adapted for cointegrated 
processes (Bauer and Wagner, 2002) and SS models have been developed for heteroscedastic noise variance 
(Wong et ah, 2006). A powerful feature of the Kalman filter underlying state prediction is that it applies 
to nonstationary systems, which raises the possibility of a systematic approach to GG analysis for systems 
where parameters vary over time. Finally, nonlinear state space systems already constitutes a mature field 
with a large literature. Bilinear SS models in particular form the basis for Direct Gausal Modeling (DGM) 
(Friston et ah, 2003), an approach to causal analysis of coupled dynamical systems sometimes seen as a rival 
to Granger-causal analysis. The state space setting provides a common framework in which to compare GG 
and DGM, helping to systematize statistical approaches to characterization of causal connectivity (Friston 
et ah, 2013). 
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A GC estimation and statistical inference for AR and SS models 


A preliminary stage in model identification is choosing an appropriate model order which avoids under- or 
over-fitting the data. The number of free parameters for an n-variable AR model of order p is pn?, while 
for an SS model with state dimension m it is 2mn (Hannan and Deistler, 2012). Standard likelihood-based 
methods such as the Akaike (AIC) or Bayesian (BIC) information critera (McQuarrie and Tsai, 1998) are 
commonly employed for model order selection. For both AR and SS models, standard expressions are 
known for the likelihood function (Liitkepohl, 2005); in the SS case the likelihood is calculated by the 
Kalman filter (Anderson and Moore, 1979). SS-SS algorithms naturally compute Kalman filter-like states, 
and also facilitate “inline” model selection criteria based on a singular value decomposition (SVD) of a 
certain (suitably weighted) regression matrix /3 (Bauer, 2001; Garcfa-Hiernaux et ah, 2012). These criteria 
may be derived in a single step as part of the parameter estimation procedure, as opposed to ML methods 
which will generally require multiple parameter estimations over a range of model orders. 

In the AR case, a naive estimator for can be formed by performing separate regressions for 

the full and reduced AR models to obtain estimates of the full and reduced covariance matrices S and 
respectively, which appear in the expression (9) for (time domain) GC. However, this is not a recommended 
procedure (Dhamala et ah, 2008b; Barnett and Seth, 2014); for conditional spectral Granger causality 
estimation in particular, independent full and reduced parameter estimates may lead to spurious, or even 
negative causality estimates (Chen et ah, 2006) This problem relates to the fact that the class of finite 
order AR models is not closed under subprocess extraction; even if the full AR model is of finite order, the 
reduced model will generally be ARM A. 

Alternative approaches are to derive reduced model parameters from the CPSD (Dhamala et ah, 2008b, a) 
or autocovariance sequence (Barnett and Seth, 2014), either of which may be calculated directly from the 
full AR parameter estimates. Although these single-regression methods generally have greater statistical 
power, the computations involve various approximations impacting numerical accuracy and stability, and 
are potentially computationally expensive. Computation of Granger causalities from the autocovariance 
sequence involves approximation of the autocovariance to a potentially large number of lags (specihcally if 
autocovariance of the estimated AR model decays slowly) and application of Whittle’s recursive time-domain 
spectral factorization algorithm (Whittle, 1963). Whittle’s algorithm, while generally stable and accurate, 
does not scale particularly well with system size. Computation of causalities from the CPSD involves the 
choice of a potentially high frequency resolution (again depending on autocovariance decay) and spectral 
factorization via Wilson’s iterative frequency-domain algorithm (Wilson, 1972). The latter again scales 
poorly with system size and, although in theory the iteration converges quadratically, may, in the authors’ 
experience, suffer from stability and numerical accuracy issues. See Barnett and Seth (2014) for further 
discussion on these issues. 

In the SS case, Granger causalities could in principle be computed similarly from the autocovariance 
sequence or CPSD. We may calculate that for the innovations form SS model (3), the autocovariance 
sequence Pfc = is given by 

Po = CnC^ + S (24a) 

Pfc = CA^VlC^ + k>0 (24b) 

where H = K{x^xJ} satisfies the discrete Lyapunov equation H = AQ.A^ +KTjK^^ while the CPSD is given 
by (5,6). However, as remarked above, there are drawbacks to these approaches, which are in any case 
unnecessary, since we have shown how reduced SS model parameters, and thence Granger causalities, may 
be calculated directly from full SS model parameters. This computation, as we have seen, involves the 
solution of a single (or two, if the original SS model is not already in innovations form) DARE, for which 
efficient, stable and accurate algorithms exist (e.g. (Arnold and Laub, 1984)), and is thus likely to be less 

®We do not recommend the “partition matrix” technique proposed in Chen et al. (2006) to address this issue, as the algorithm 
is technically flawed and leads to inaccurate results. The flaw relates to an inappropriate use of filtering in place of spectral 
factorization. 
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computationally intensive as well as more stable and numerically accurate than CPSD- or autocovariance- 
based methods. The efficacy of this method is demonstrated in the simulation experiments. 

Regarding statistical inference, if in the AR case the full and reduced models are independently es¬ 
timated then the associated estimator of log-likelihood ratio statistic for a nested AR 

model under the null hypothesis of vanishing causality. As such, from the standard large sample theory 
(Wilks, 1938; Wald, 1943), it has an asymptotic sampling distribution under the null hypothesis as a 
with df = pnin 2 degrees of freedom, where ni,n 2 are the dimensions of yi,y 2 respectively and p the AR 
model order. For estimated from the autocovariance sequence, we have found that the estimator 

is well-approximated by a F distribution (of which the x^ distribution is a special case) with shape pa¬ 
rameter corresponding to (possibly non-integer) df < pnin 2 - See FIG. 3, where empirical null and causal 
AR GC distributions for our simulation experiment were fitted, based on ML parameter estimation, to F 
distributions; the fitted CDFs are indistinguishable from the empirical CDFs at the scale of the figure. We 
have not been able to ascertain a simple formula for d/ in terms of model parameters. 

In the SS case, since the full and reduced SS models are not simply nested, it is not clear how the large 
sample theory might obtain Under the null hypothesis of zero causality, the sampling distribution for an 
estimator of d^y 2 ^y^\y^ obtained by replacing Sn and (the latter derived as described in Section 3) model 
estimates, is again well-approximated by a F distribution. In FIG. 3 fitted GDFs are again indistinguishable 
from the empirical CDFs. 

Lacking known theoretic asymptotic distributions, permutation and bootstrapping (or other surrogate 
data techniques) may be deployed for significance testing and estimation of confidence intervals respectively. 
As regards frequency-domain Granger causality estimators, as far as we are aware no sampling distributions 
are known, even in the AR case. 

B Statistical power and bias for GC estimators 

Statistical power of an estimator is defined as the fraction of true positives (i.e. correct rejections of the null 
hypothesis) obtained at a given significance level. Suppose that the cumulative distribution function (CDF) 
of a GC estimator T given an actual causality F is <h_F(...). To test for statistical significance at level a, we 
would reject the null hypothesis of zero causality when F > Fcrit = *^0^(1 “ “)• Given an actual causality 
F, then, the statistical power is P{F > Fcrit} = 1 — ^(1 — «))• Under the same assumptions, the bias 

of the estimator is defined as ElF} — F. 

C SS-SS modeling and model order selection 

For SS GC estimates, SS modeling was performed using the CCA state space-subspace algorithm (Larimore, 
1983; Bauer, 2005). We note firstly that not much appears to be known about optimal values for the past 
and future horizons for subspace algorithms. Bauer and Wagner (2002) suggest setting both to 2pAic where 
PAic is the AlC-based AR model order estimate (see also (Garcfa-Hiernaux et ah, 2012)). In this study we 
used pbiC) the BIG AR order estimate, which was found to yield better results. 

To estimate model order, we used a “steepest drop criterion” (SDC), defined as SDC(m) = cr^+i — 2a‘^ + 
where cr^ is the mth singular value of the CCA-weighted regression matrix (3 calculated in the SS-SS 
algorithm (Bauer, 2001). The optimal model order estimate is then taken as the value of m which maximizes 
SDC(m). The criterion thus attempts to pinpoint the steepest drop in singular value with increasing state 
space dimension (FIG. 4). We also tested alternative model order selection schemes, including the NIC, SVC 
and IVC of Bauer (2001). SDC estimates were, overall, similar to those of SVC, but proved more robust to 
variation between sample time series. 

^Although the non-causality condition (17) suggests df = mnin 2 degrees of freedom for an appropriate nested model, 
preliminary testing with SS-SS GC estimates indicates that, even for quite long time series, the corresponding distributions 
do not in general furnish a good fit with the empirical distributions. 
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Figure 3; Empirical distribution (CDF) of for the AR(1) process (18) with MA filter (21), with 

reference parameters (22,23) and T = 1000 time steps. Left column: causal model {F = 0.02, black 
vertical line is actual causality, colored vertical lines indicate means), right column: null model {F = 0, 
colored vertical lines indicate critical causalities Fcrit at 95% significance). MA order r increases from top 
to bottom. 
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Figure 4: The SS model order criterion SDC(m) for a sample filtered process (see text for details). 
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